Filtering Web Pages with Application Ontologies
Abstract
Automatically recognizing which HTML documents on the Web contain objects that are “of interest” to a user is non-trivial. As a step toward solving this problem for information filtering, in which a user expresses a long-term information need through a specific profile, we propose an approach based on ontological descriptions. The HTML documents we consider include semistructured HTML documents, HTML tables, and HTML forms. Given the keywords, values, and kinds of values that an ontological specification recognizes in an HTML document, we apply several heuristics: (1) a density heuristic that measures the percentage of the document that appears to apply to the application ontology, (2) an expected-value heuristic that compares the number and kind of values found in the document with the number and kind expected by the application ontology, and (3) a grouping heuristic that considers whether the values in the document appear to be grouped as application-ontology records. Then, based on machine-learned rules over these heuristic measurements, we determine whether an HTML document contains objects of interest with respect to an application ontology. Our experimental results show that we achieved about 95% for both recall and precision.
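The abstract names the three heuristics but does not give their formulas, so the following is only a minimal sketch under stated assumptions. The used-car ontology (`CAR_ONTOLOGY`), its regex recognizers, the cosine-similarity comparison for the expected-value heuristic, the fixed-window test for the grouping heuristic, and the thresholds in `is_relevant` are all hypothetical stand-ins chosen for illustration; in particular, the hand-set thresholds replace the machine-learned rules the authors describe.

```python
import math
import re
from dataclasses import dataclass


# Hypothetical application-ontology entry: a value recognizer plus the
# expected number of occurrences of that value in each record.
@dataclass(frozen=True)
class ObjectSet:
    name: str
    pattern: re.Pattern
    expected_per_record: float


# Hypothetical used-car ontology; the regexes are illustrative only.
CAR_ONTOLOGY = [
    ObjectSet("Year", re.compile(r"\b(?:19|20)\d{2}\b"), 1.0),
    ObjectSet("Make", re.compile(r"\b(?:Ford|Honda|Toyota|Nissan)\b"), 1.0),
    ObjectSet("Price", re.compile(r"\$\d[\d,]*"), 1.0),
    ObjectSet("Mileage", re.compile(r"\b\d[\d,]*\s?miles\b", re.I), 0.8),
]


def find_matches(text, ontology):
    """Return (start, end, object-set name) for every recognized value, in document order."""
    hits = []
    for obj in ontology:
        for m in obj.pattern.finditer(text):
            hits.append((m.start(), m.end(), obj.name))
    return sorted(hits)


def density(text, hits):
    """Density heuristic (sketch): share of non-whitespace characters covered by recognized values."""
    covered = set()
    for start, end, _ in hits:
        covered.update(range(start, end))
    relevant = [i for i, ch in enumerate(text) if not ch.isspace()]
    if not relevant:
        return 0.0
    return sum(1 for i in relevant if i in covered) / len(relevant)


def expected_value_score(hits, ontology):
    """Expected-value heuristic (sketch): cosine similarity of found counts vs. expected per-record counts."""
    found = [sum(1 for h in hits if h[2] == obj.name) for obj in ontology]
    expected = [obj.expected_per_record for obj in ontology]
    dot = sum(f * e for f, e in zip(found, expected))
    norm = math.sqrt(sum(f * f for f in found)) * math.sqrt(sum(e * e for e in expected))
    return dot / norm if norm else 0.0


def grouping_score(hits, ontology):
    """Grouping heuristic (sketch): share of consecutive windows of one-per-record values that form a full record."""
    required = [o.name for o in ontology if o.expected_per_record == 1.0]
    seq = [name for _, _, name in hits if name in required]
    k = len(required)
    groups = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    if not groups:
        return 0.0
    complete = sum(1 for g in groups if sorted(g) == sorted(required))
    return complete / len(groups)


def is_relevant(text, ontology=CAR_ONTOLOGY):
    """Combine the three measurements; thresholds are hypothetical, not learned."""
    hits = find_matches(text, ontology)
    d = density(text, hits)
    v = expected_value_score(hits, ontology)
    g = grouping_score(hits, ontology)
    return d >= 0.2 and v >= 0.8 and g >= 0.5
```

For example, a listing such as `"2005 Honda $8,500 62,000 miles. 2010 Toyota $12,900 40,000 miles."` scores high on all three measurements, while a page that merely mentions a year scores low on the expected-value and grouping measurements. Keeping each heuristic as a separate function mirrors the abstract's design, in which the measurements are computed first and a (learned) rule decides afterwards.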
Related articles
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
Dynamic Ontologies on the Web
We discuss the problems associated with managing ontologies in distributed environments such as the Web. The Web poses unique problems for the use of ontologies because of the rapid evolution and autonomy of web sites. We present SHOE, a web-based knowledge representation language that supports multiple versions of ontologies. We describe SHOE in terms of a logic that separates data from on...
Ontologies Come of Age
Ontologies have moved beyond the domains of library science, philosophy, and knowledge representation. They are now the concerns of marketing departments, CEOs, and mainstream business. Research analyst companies such as Forrester Research report on the critical roles of ontologies in support of browsing and search for e-commerce and in support of interoperability for facilitation of knowledge ...